Development and validation of novel models for the prediction of intravenous corticosteroid resistance in acute severe ulcerative colitis using logistic regression and machine learning

Abstract
Background: The early prediction of intravenous corticosteroid (IVCS) resistance in acute severe ulcerative colitis (ASUC) patients remains an unresolved challenge. This study aims to construct and validate a model that accurately predicts IVCS resistance.
Methods: A retrospective cohort was established, with consecutive inclusion of patients who met the diagnostic criteria of ASUC and received IVCS during index hospitalization in Peking Union Medical College Hospital between March 2012 and January 2020. The primary outcome was IVCS resistance. Classification models, including logistic regression and machine learning-based models, were constructed. External validation was conducted in an independent cohort from Shengjing Hospital of China Medical University.
Results: A total of 129 patients were included in the derivation cohort. During index hospitalization, 102 (79.1%) patients responded to IVCS and 27 (20.9%) failed; 18 (14.0%) patients underwent colectomy within 3 months; 6 received cyclosporin as rescue therapy, of whom 2 eventually escalated to colectomy; 5 succeeded with infliximab as rescue therapy. The Ulcerative Colitis Endoscopic Index of Severity (UCEIS) and C-reactive protein (CRP) level at Day 3 were independent predictors of IVCS resistance. The areas under the receiver-operating characteristic curves (AUROCs) of the logistic regression, decision tree, random forest, and extreme-gradient boosting models were 0.873 (95% confidence interval [CI], 0.704–1.000), 0.648 (95% CI, 0.463–0.833), 0.650 (95% CI, 0.441–0.859), and 0.604 (95% CI, 0.416–0.792), respectively. The logistic regression model achieved the highest AUROC value of 0.703 (95% CI, 0.473–0.934) in the external validation.
Conclusions: In patients with ASUC, UCEIS and CRP levels at Day 3 of IVCS treatment appeared to allow the prompt prediction of likely IVCS resistance. We found no evidence of better performance of machine learning-based models in IVCS resistance prediction in ASUC. A nomogram based on the logistic regression model might aid in the management of ASUC patients.


Introduction
Acute severe ulcerative colitis (ASUC) is a potentially life-threatening medical emergency that requires timely recognition and intervention [1]. A total of 15%-25% of patients with ulcerative colitis (UC) will need hospitalization for an acute severe flare of disease in their natural history [2]. Intravenous corticosteroids (IVCS) are the first-line therapy for ASUC [3][4][5]. However, approximately 30% of patients may become IVCS-resistant and require second-line therapy. Approximately 25%-30% of patients need short-term colectomy [4]. Delays in rescue therapy are associated with higher morbidity and mortality rates [6,7]. Therefore, prompt and accurate prediction of IVCS resistance in ASUC patients is of great importance.
Many predictors of IVCS response have been proposed, including the severity of clinical symptoms, laboratory biomarkers, and composite scoring systems. According to previous studies, the erythrocyte sedimentation rate (ESR) [8,9] and albumin (Alb) [10,11] at Day 1 of IVCS treatment might indicate steroid failure, and C-reactive protein (CRP) at Day 3 [12,13] could predict colectomy. However, conflicting results have also been reported [14][15][16][17]. The predictive value of endoscopy has been a research focus, especially since the emergence of reproducible endoscopic severity scores. However, the widely recognized composite scoring systems for ASUC patients, such as the Travis score [17], Ho score [10], and Lindgren score [18], do not incorporate endoscopic features. There remains an unmet need for predicting the outcomes of IVCS treatment in patients with ASUC.
Traditional approaches, such as logistic regression (LR), have long been utilized in disease outcome prediction. In the past decade, artificial intelligence has made its way into many medical domains, driven by the growing volume of data in electronic health records, imaging, and multiomics [19]. A recent systematic review comprehensively synthesized and appraised machine learning (ML)-based prediction models in inflammatory bowel diseases [20]. However, few ML-based models address outcome prediction in patients with ASUC, and results regarding IVCS resistance are scarce [21,22].
This study aimed to assess the predictive value of clinical, laboratory, and endoscopic parameters and to develop novel predictive models for short-term outcomes in patients with ASUC. The traditional LR approach and a series of ML-based algorithms were used in model development. External validation of the selected models was conducted.

Patients
This is a retrospective cohort study. A derivation cohort was recruited between March 2012 and January 2020 at Peking Union Medical College Hospital (Beijing, China). Patients who met the diagnostic criteria of ASUC (modified Truelove and Witts criteria: >6 bloody stools per day and systemic toxicity with at least one of temperature of >37.8°C, pulse of >90 bpm, hemoglobin of <105 g/L, or CRP of >30 mg/L) [3] and received IVCS (hydrocortisone at 300-400 mg/day or methylprednisolone at 60-80 mg/day) during index hospitalization were included. Patients confirmed as having Crohn's disease during follow-up and patients with incomplete endoscopic or laboratory data were excluded. An independent validation cohort covering the same time frame was recruited from Shengjing Hospital of China Medical University (Liaoning, China). The same set of criteria was used to build the two cohorts. Research approval was obtained from the Ethics Committees of Peking Union Medical College Hospital (approval no. S-K1723) and Shengjing Hospital of China Medical University (approval no. 2022PS756K). All patients provided informed consent. The study conformed with the principles in the Declaration of Helsinki.
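The inclusion rule quoted above can be written as a small check. This is only an illustrative sketch in Python, not the authors' screening procedure; the function and parameter names are hypothetical, and the thresholds are taken directly from the modified Truelove and Witts criteria as stated in the text.

```python
def meets_asuc_criteria(bloody_stools_per_day, temperature_c, pulse_bpm,
                        hemoglobin_g_per_l, crp_mg_per_l):
    """Return True when the modified Truelove and Witts criteria quoted in the text are met.

    Hypothetical helper: >6 bloody stools per day AND at least one sign of
    systemic toxicity (fever, tachycardia, anemia, or raised CRP).
    """
    systemic_toxicity = (
        temperature_c > 37.8          # degrees Celsius
        or pulse_bpm > 90             # beats per minute
        or hemoglobin_g_per_l < 105   # g/L
        or crp_mg_per_l > 30          # mg/L
    )
    return bloody_stools_per_day > 6 and systemic_toxicity


# Hypothetical patient: 8 bloody stools per day, afebrile, CRP 45 mg/L.
print(meets_asuc_criteria(8, 37.0, 80, 120, 45))
```

Stool frequency alone is not sufficient under this rule: a patient with frequent bloody stools but no marker of systemic toxicity is not classified as ASUC.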

Predictor variables
A total of 13 demographic and clinical factors, including age, sex, duration of disease, hospital stay, stool frequency, concomitant infections, Montreal classification of disease extent [4], medication history, and extra-intestinal manifestations, were recorded during index hospitalization. Concomitant cytomegalovirus (CMV) infection was defined as the identification of CMV inclusion bodies or positive CMV-specific immunohistochemistry, or the detection of blood CMV DNA by quantitative polymerase chain reaction (qPCR), within a week before or after the initiation of IVCS treatment [23]. Clostridium difficile infection was defined as the presence of C. difficile toxin A/B or the C. difficile-specific gene tpi and toxin gene (tcdA/tcdB) identified by polymerase chain reaction (PCR) within a week before or after the initiation of IVCS treatment [24]. Laboratory data at admission and on the third day of IVCS treatment were recorded as 14 continuous factors (absolute counts of white blood cells [WBC], neutrophils, hemoglobin level [Hgb], platelet count [PLT], CRP, ESR, and Alb). Endoscopic predictors before initiating IVCS treatment, including Mayo score, Ulcerative Colitis Endoscopic Index of Severity (UCEIS) score, luminal narrowing, and rectal sparing, were obtained through an external post hoc assessment by blinded inflammatory bowel disease (IBD) specialists.

Definition of outcomes
The primary outcome was IVCS resistance, defined as the requirement for rescue therapy (medical therapy or surgery) during the index hospitalization. Patients were defined as colectomy-free if they underwent no colectomy during the index hospitalization or within 3 months afterwards; colectomy performed after IVCS or after the failure of medical rescue therapy during the same exacerbation counted as colectomy.

Statistical analysis
Statistical analysis was performed using R (Version 4.0.5; R Core Team, 2021). Continuous variables with a non-Gaussian distribution are expressed as the median and interquartile range (IQR) and were compared using the Mann-Whitney U test. Categorical data are presented as counts and percentages and were compared using Fisher's exact test. Univariate analyses were first performed to identify predictors of short-term outcomes. Multivariable analysis was performed to determine the independent effects.
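As an illustration of the categorical comparison described above, the following Python sketch implements a two-sided Fisher's exact test from first principles. It is not the authors' R code. The 2x2 counts are an assumption back-calculated from the percentages reported in the results for a Mayo score of 3 (80.4% of 102 responders, i.e. 82, vs 100% of 27 non-responders), so the output can be compared with the reported P value for that comparison.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalised integer weights keep the "no more likely" comparison exact.
    weight = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    extreme = sum(w for w in weight.values() if w <= weight[a])
    return extreme / comb(row1 + row2, col1)


# Rows: IVCS responders, non-responders. Columns: Mayo score 3, Mayo score < 3.
# Counts are back-calculated from the reported percentages (assumption).
p_value = fisher_exact_two_sided(82, 20, 27, 0)
print(f"two-sided P = {p_value:.4f}")
```

Working with integer weights rather than floating-point probabilities avoids rounding problems when two tables have nearly identical likelihoods.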

Model development and validation
For the LR model, the least absolute shrinkage and selection operator (LASSO) method was used to help optimize feature selection and minimize overfitting. For the ML-based models, the package caret and Boruta algorithm were used for hyperparameter optimization. Models based on classifiers, including decision tree (DT), random forest (RF), and extreme-gradient boosting (XGB), were constructed. As previously described [25,26], the discovery data set was randomly split into a 70% training set and a 30% testing set. The division procedure was replicated 100 times and the area under the receiver-operating characteristic curve (AUROC) was calculated in each split. The mean AUROC was obtained, and the split whose AUROC was closest to the mean was selected as representative and presented. Predictor variable importance was derived from the RF and XGB models trained on the entire data set.
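The repeated 70/30 split and the choice of a representative split can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R pipeline: the data are synthetic, the one-feature `fit_mean_difference` "model" is a hypothetical stand-in for the LR and ML models, and the split is stratified by outcome so that every test set contains both classes (the text only says the split was random).

```python
import random
from statistics import mean


def auroc(scores, labels):
    """AUROC as P(score_pos > score_neg) + 0.5 * P(tie), over all positive/negative pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels, test_frac, rng):
    """Return (train_idx, test_idx), sampling test_frac of each class (assumption)."""
    test = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        test += idx[: max(1, round(test_frac * len(idx)))]
    test_set = set(test)
    train = [i for i in range(len(labels)) if i not in test_set]
    return train, sorted(test)


def repeated_holdout(X, y, fit, n_splits=100, test_frac=0.3, seed=0):
    """Repeat the split, score each test set, and pick the split closest to the mean AUROC."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        train, test = stratified_split(y, test_frac, rng)
        score = fit([X[i] for i in train], [y[i] for i in train])
        auc = auroc([score(X[i]) for i in test], [y[i] for i in test])
        results.append((auc, train, test))
    mean_auc = mean(r[0] for r in results)
    representative = min(results, key=lambda r: abs(r[0] - mean_auc))
    return mean_auc, representative, [r[0] for r in results]


def fit_mean_difference(X_train, y_train):
    """Toy one-feature model: orient the feature so that higher means more likely positive."""
    pos = mean(x for x, y in zip(X_train, y_train) if y == 1)
    neg = mean(x for x, y in zip(X_train, y_train) if y == 0)
    sign = 1.0 if pos >= neg else -1.0
    return lambda x: sign * x


# Synthetic data only: 27 "resistant" and 102 "responder" patients with one made-up feature.
rng = random.Random(42)
y = [1] * 27 + [0] * 102
X = [rng.gauss(7.2, 1.0) if label else rng.gauss(6.0, 1.0) for label in y]

mean_auc, (rep_auc, rep_train, rep_test), aucs = repeated_holdout(X, y, fit_mean_difference)
print(f"mean AUROC {mean_auc:.3f}, representative split AUROC {rep_auc:.3f}")
```

Reporting the split closest to the mean, rather than the best split, keeps the presented curve from being an optimistic outlier among the 100 replicates.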
To comprehensively evaluate the selected model, calibration curves were plotted to assess the calibration. Decision curve analysis was conducted to determine the clinical usefulness and net benefit. Finally, a predictive nomogram based on the LR model was established. The external validity of the selected models was assessed with data from the validation cohort.

Patient characteristics
A total of 196 patients with UC were hospitalized and screened. Among these patients, 2 were diagnosed with Crohn's colitis during follow-up and 31 were excluded for lack of endoscopic and laboratory data; 34 patients did not rigorously meet the modified Truelove and Witts criteria for ASUC. Eventually, 129 patients with ASUC were included in the analysis (Table 1). The median age at admission was 40 years (IQR 30-51). Sixty-two (48.1%) patients were female. Twelve (9.3%) patients were newly diagnosed with UC and the others suffered from recurrence. The median duration of UC was 2 years (IQR 1.0-6.0). A total of 121 (93.8%) patients had extensive UC (E3) and 8 (6.2%) patients had left-sided UC (E2). Sixty-nine (53.5%) of the patients had received treatments for UC before admission, including oral or systemic corticosteroids (67, 51.9%), immunosuppressants (10, 7.8%), and biologics (7, 5.4%). Of the seven patients who had been treated with biologics before hospitalization, two received adalimumab and five received infliximab. Six cases of extraintestinal manifestations at admission were reported, including primary biliary cholangitis (n = 1), ankylosing spondylitis (n = 2), and venous thrombosis (n = 3: 2 cases of upper extremity superficial vein thrombosis and 1 case of lower extremity intermuscular vein thrombosis).
Compared with patients who were eventually resistant to IVCS, the IVCS responders had a shorter duration of IVCS use  Figure 1A and B) were significantly higher in the non-responders, as were the endoscopic severity scores, including the Mayo score (percentage with a Mayo score = 3, 80.4% vs 100%, P = 0.007) and the UCEIS score (6.00 [5.25-7.00] vs 7.00 [7.00-8.00] points [pts], P < 0.001; Table 1). Distinct patterns of endoscopic features were observed in patients with different outcomes (Figure 1C and D). Of the three descriptors constituting the UCEIS score, vascular patterns were similar between groups, whereas bleeding and erosions and ulcers were more severe in IVCS-resistant patients and patients who needed colectomy. Full characterization of the cohort is provided in Supplementary Table 1.

Clinical outcomes
During index hospitalization, 102 (79.1%) ASUC patients responded to IVCS, whereas 27 (20.9%) patients were resistant to IVCS. The rescue therapy was cyclosporin in 6 patients, infliximab in 5 patients, and colectomy in 16 (12.4%) patients. Two of the patients treated with cyclosporin eventually needed colectomy. Patients treated with infliximab were all free of colectomy. The colectomy rate within 3 months after admission was 14.0% (n = 18), with a median duration of 28 (IQR, 20.0-34.0) days.

Predictors of short-term outcomes
Univariate analysis was performed to determine factors that were associated with IVCS resistance within 3 months. PLT at admission, serum level of CRP at admission and Day 3 of IVCS treatment, prior steroid use, and UCEIS scores were potential factors associated with IVCS resistance within 3 months (all P < 0.1; Table 2; univariate analysis of all the factors shown in Supplementary Table 2). Patients who received colectomy within 3 months were significantly older at the episode of ASUC (P = 0.033). By multivariate analysis, UCEIS score (odds ratio [OR], 5.67; 95% confidence interval [CI], 2.34-13.72, P < 0.001) and CRP at Day 3 of IVCS treatment (OR, 1.05; 95% CI, 1.02-1.08, P = 0.001) were identified as independent predictors of IVCS resistance.

Existing scoring system and clinical outcomes
Three widely recognized indices for patients with ASUC, the Travis [17], Ho [10], and Lindgren [18] scores, were calculated (Supplementary Table 3).

Development and evaluation of the novel predictive models
Development of the predictive models
A total of 28 variables were reduced to two potential predictors, UCEIS and CRP at Day 3 of IVCS treatment, by LASSO regression analysis (Figure 2A and B); these were incorporated into the LR model (β0 = −12.871, β for UCEIS = 1.543, β for CRP at Day 3 = 0.053). Three features, including CRP at Day 3 of >34.6 mg/L, UCEIS score of >6.5 pts, and CRP at admission of >103.1 mg/L, entered the DT model. The top five important predictors in the RF model were CRP at Day 3, UCEIS score, CRP at admission, WBC at Day 3, and PLT at admission (Figure 2C). The top five important predictors in the XGB model were CRP at Day 3, UCEIS score, neutrophils at Day 3, Hgb at admission, and ESR at Day 3 (Figure 2D).
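To show how the reported LR coefficients translate into an individual risk estimate, here is a minimal Python sketch. It assumes the standard logistic form, with the intercept taken as negative (−12.871), the UCEIS score in points, and CRP at Day 3 in mg/L; the function name and the example patient are hypothetical, and the sketch is not a substitute for the published nomogram.

```python
from math import exp

# Coefficients as reported for the LR model in the text.
INTERCEPT = -12.871
BETA_UCEIS = 1.543       # per UCEIS point
BETA_CRP_DAY3 = 0.053    # per mg/L of CRP at Day 3


def ivcs_resistance_probability(uceis, crp_day3_mg_per_l):
    """Predicted probability of IVCS resistance under a standard logistic model (assumption)."""
    linear_predictor = (INTERCEPT
                        + BETA_UCEIS * uceis
                        + BETA_CRP_DAY3 * crp_day3_mg_per_l)
    return 1.0 / (1.0 + exp(-linear_predictor))


# Hypothetical patient: UCEIS 7 points, CRP 30 mg/L on Day 3 of IVCS treatment.
print(f"{ivcs_resistance_probability(7, 30):.3f}")
```

Because the UCEIS coefficient is large relative to the CRP coefficient, a one-point change in UCEIS shifts the linear predictor about as much as a 29 mg/L change in Day 3 CRP (1.543 / 0.053).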

Pragmatic model and nomogram
Considering both predictive performance and clinical applicability, the LR model was selected as the final model for IVCS resistance prediction. The calibration curve of the LR model demonstrated good agreement in the retrospective cohort (Figure 3C). As presented in Figure 3D, the decision curve analysis graphically shows the clinical usefulness of the prediction model across a continuum of potential thresholds for IVCS resistance risk (x-axis) and the net benefit of using the model to risk-stratify patients (y-axis) relative to assuming that no patient will be IVCS-resistant. A nomogram based on the selected model was constructed (Figure 3E).
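The net benefit plotted in a decision curve is usually computed with the standard formula NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability; the "no patient is IVCS-resistant" strategy has a net benefit of zero by construction. The Python sketch below illustrates that formula on made-up predictions. It is an assumption that the authors' software used this exact definition, and the function name and data are hypothetical.

```python
def net_benefit(predicted_probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(predicted_probs, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Made-up predictions and outcomes for four patients (illustration only).
probs = [0.9, 0.8, 0.3, 0.1]
resistant = [1, 0, 1, 0]

for pt in (0.2, 0.5):
    model_nb = net_benefit(probs, resistant, pt)
    treat_all_nb = net_benefit([1.0] * len(resistant), resistant, pt)
    print(f"threshold {pt}: model {model_nb:.4f}, treat-all {treat_all_nb:.4f}, treat-none 0")
```

A model is clinically useful at a given threshold only if its net benefit exceeds both reference strategies, treating everyone and treating no one.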

Discussion
Accurate risk assessment of IVCS resistance in patients with ASUC is paramount since delaying the initiation of rescue therapy is a recognized determinant of morbidity and mortality [6]. Biomarkers and composite scoring systems might help predict outcomes in ASUC. However, the roles of individual biomarkers require further clarification and there remains considerable uncertainty over the utility and preferences of the scoring system [27]. An effective predictive tool could help physicians identify patients requiring escalation to reduce unnecessary steroid exposure and improve prognosis.
In this study, we used the traditional LR model and ML approaches to generate predictive models of IVCS resistance in patients with ASUC. The LR model showed good accuracy in internal validation. In the validation cohort, the LR model showed an AUROC of 0.703 (95% CI, 0.473-0.934), suggesting satisfactory generalization capability. The DT, RF, and XGB models, by contrast, performed worse and were relatively inaccurate in both the internal and external validation. In addition, the LR model showed better performance than the Travis, Ho, and Lindgren scores. All three scores incorporate stool frequency, which could be inaccurate due to information bias. Moreover, none of the scores incorporates endoscopic findings, which clearly merit careful consideration as predictors. In consideration of model performance and clinical utility, the LR model was selected as the final pragmatic model. Finally, a nomogram was established for convenient individualized risk stratification and clinical decision-making in patients with ASUC.
The application of artificial intelligence in medicine has attracted considerable attention recently. Nguyen et al. [20] comprehensively analysed 13 ML-based prediction models in the field of IBD and concluded that ML models generally perform better than traditional statistical models. However, these models are limited by a lack of external validation and clinical applicability. Christodoulou et al. [19] identified 282 comparisons between LR and ML models and found that there was no significant difference in performance for comparisons at low risk of bias. This discrepancy, and our own results, can be explained in several ways. First, ML models tend to perform well in problems with a strong signal-to-noise ratio [28], but clinical prediction problems usually have a low signal-to-noise ratio. Second, class imbalance is a common problem in ML model development, but adjusting class imbalance could distort prevalence, which is not appropriate in clinical scenarios [29]. Finally, as suggested by previous research, compared with LR models, ML models need more data to achieve an ideal prediction [30,31]. Indeed, for high-dimensional data, such as multiomics and imaging data, ML-based approaches may be good choices. However, in the scenario of this study, the LR model was better. This does not negate the usability of ML-based approaches; with the rapid growth of clinical data, ML-based models still hold great potential for medical application and further exploration is warranted.
The UCEIS score and CRP level at Day 3 of IVCS treatment were demonstrated to be two crucial predictors of IVCS resistance in patients with ASUC. The UCEIS score and CRP level were identified as the only two independent predictors in the multivariate analysis. These two factors also had the highest importance in the ML-based models. Our findings regarding the UCEIS score indicated that it outperformed the Mayo score in risk stratification and prediction. We also found that ASUC patients who were eventually resistant to corticosteroids and received colectomy were more likely to have luminal bleeding (2-3 pts) and deep ulcers (3 pts). These findings were consistent with previous studies supporting the value of UCEIS in predicting steroid failure [13,32,33]. Therefore, we suggest careful and detailed examination and evaluation of endoscopic characteristics during the index hospitalization of patients with ASUC. Regarding the CRP levels, our results were consistent with previous studies showing that the CRP level at Day 3 of IVCS treatment was predictive of steroid failure [10,17,18,34]. Generally, the importance of inflammatory marker monitoring in the early stage of IVCS treatment should be emphasized.
The study has several limitations. First, the small sample size from a single center makes it difficult to assess the effects of some factors. However, our sample size is comparable to the sample sizes in the original studies of the Travis and Ho scores [10,17], which are widely accepted. In addition, we included an independent validation cohort that helped determine the reproducibility and generalizability in different patients [35]. Nevertheless, the IVCS resistance rates in the validation cohort were slightly higher and the colectomy rates were lower than those in the derivation cohort, which may induce miscalibration. In the future, the model can be modified by adjusting the baseline hazard or model intercept to better suit the average outcome risk in a larger external population [35]. Second, endoscopic features and laboratory data are currently routinely involved in the diagnosis and decision-making of clinicians, which could introduce circularity of argument, an inherent issue in data interpretation; a prospective study may help address this in the future [27]. Moreover, in addition to endoscopic features and laboratory data, many novel predictors for IVCS response in UC patients have been proposed, including fecal calprotectin levels [36,37] and histological indices [38][39][40]. These factors were not recorded in our study owing to practical constraints. The predictive value of these factors and a composite score incorporating these factors need to be assessed in the future.
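The intercept adjustment mentioned among the limitations is often done as "recalibration in the large": the slopes are kept, and a single shift is added to the intercept so that the mean predicted risk matches the event rate observed in the new population. The Python sketch below illustrates this with a simple bisection search. It is an assumption that this is the form of updating the authors have in mind, and the linear predictors and outcomes are invented for illustration.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def intercept_shift(linear_predictors, outcomes, tol=1e-10):
    """Find delta such that mean(sigmoid(lp + delta)) equals the observed event rate.

    The mean prediction increases monotonically with delta, so bisection converges.
    """
    target = sum(outcomes) / len(outcomes)
    if not 0.0 < target < 1.0:
        raise ValueError("both outcome classes are needed")
    lo, hi = -30.0, 30.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_pred = sum(sigmoid(lp + mid) for lp in linear_predictors) / len(linear_predictors)
        if mean_pred < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Invented external cohort: the original model under-predicts risk (mean prediction below 50%).
external_lp = [-2.0, -1.0, 0.0, 1.0]   # intercept + slopes applied to each patient
external_outcome = [0, 0, 1, 1]        # observed IVCS resistance (event rate 50%)

delta = intercept_shift(external_lp, external_outcome)
print(f"add {delta:.3f} to the model intercept")
```

This update changes only the average predicted risk; it does not correct slopes that are too steep or too flat, which would require refitting or a calibration-slope adjustment on a larger external data set.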
In conclusion, this study has developed and validated novel models to predict IVCS resistance in patients with ASUC. The UCEIS score and CRP level at Day 3 of IVCS treatment were identified as factors strongly associated with the short-term outcome. ML did not show better performance than the traditional LR model in this scenario. The pragmatic model might help in the management of patients with ASUC.

Supplementary Data
Supplementary data is available at Gastroenterology Report online.